To investigate the causes of variation in cloze-comprehension test
correlations, a reanalysis was conducted of the influential J. R. Bormuth
study (1962), which reported correlations between nine cloze and nine
comprehension tests administered to 50 subjects in each of grades 4, 5,
and 6. Two separate reanalyses were completed in the present study, the
first using cloze-comprehension correlations as a dependent measure and
the second using correlations of cloze with each of the Bormuth question
categories--vocabulary, factual recall, sequential order, cause and
effect, inference, and author's purpose--at each grade level. Results
indicated that in both reanalyses, variable reliability and grade level
differences accounted for some of the variation in correlations. The
proportion of comprehension test questions found to assess
within-sentence information was found to be significant in the first
regression analysis but not in the second. Results suggested that the
limited construct validity of cloze as a measure of comprehension had a
small but significant influence on the concurrent validity of cloze with
measures of comprehension. Cloze predicted performance best when
within-sentence comprehension dominated, but was less useful when
students were expected to integrate information across sentence
boundaries. (MM)

Cloze tests have traditionally been used for two purposes: to assess
comprehension and to establish reading levels for instruction. However,
recent cloze research (Kibby, 1981; Shanahan & Kamil, 1982; Shanahan &
Kamil, in press; Shanahan, Kamil, & Tobin, 1982) has demonstrated that
cloze has low construct validity as a test of reading comprehension.
Specifically, cloze is insensitive to the integration of information
across sentence boundaries. Thus, the use of cloze to measure
comprehension should be viewed with suspicion.

Cloze is used to establish reading levels because it has been found to
have high concurrent validity, not because of any claims about its
construct validity. That is, cloze has been proposed as an attractive
alternative to more expensive (in terms of time and effort) approaches to
reading assessment because it has been found to be significantly
correlated with such measures (Bickley, Ellington, & Bickley, 1970;
McKenna & Robinson, [year illegible]; Rankin, 1965).

The correlations of cloze scores with reading comprehension scores
usually range from about .40 to .80, across a variety of tests, passages,
etc. Cloze actually explains as little as 16% to as much as 65% of
reading comprehension test score variance. Although these correlations
are usually significant, this range of values suggests that cloze is more
appropriate under certain text or measurement conditions than it is under
others. The purpose of this paper is to identify sources of variation in
a set of cloze-comprehension test correlations from a classical study of
concurrent validity (Bormuth, 1962).

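
The step from a correlation to "variance explained" is simply squaring
the coefficient. The short sketch below is an illustration added for the
reader, not part of the original analysis; the function name is
hypothetical. Squaring the endpoints of the quoted range gives figures
close to the percentages stated above.

```python
def variance_explained(r):
    """Return the proportion of variance shared by two measures (r squared)."""
    return r * r


# Endpoints of the correlation range quoted in the text.
for r in (0.40, 0.80):
    share = variance_explained(r)
    print(f"r = {r:.2f} -> about {share:.0%} of variance explained")
```
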
Part of the variation in the cloze-comprehension correlations is almost
certainly due to differences in the reliabilities of the measures used in
the research. The lower the reliabilities of either the cloze or the
comprehension tests, the more attenuated the correlations among these
tests. Since the concurrent validity studies have not used equally
reliable measures, variation in correlations is to be expected, and,
therefore, some amount of that variation is explained by differences in
the reliabilities.
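
The attenuation argument can be made concrete with the classical
(Spearman) relation, in which the observed correlation equals the
correlation between true scores multiplied by the square root of the
product of the two reliabilities. The sketch below assumes that relation;
the function names and the numbers are hypothetical and are not taken
from Bormuth's data.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Observed correlation expected when both measures are unreliable."""
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_r(observed_r, rel_x, rel_y):
    """Estimate of the true-score correlation from an observed correlation."""
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical case: the same underlying relationship (.80) measured with
# more reliable and with less reliable pairs of tests.
for rel_cloze, rel_comp in ((0.90, 0.90), (0.81, 0.64)):
    r_obs = attenuated_r(0.80, rel_cloze, rel_comp)
    print(f"reliabilities {rel_cloze:.2f}/{rel_comp:.2f} -> observed r = {r_obs:.3f}")
```

Under this relation, two studies of the same underlying relationship will
report different correlations simply because their tests differ in
reliability.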




Another possible source of variation across studies is sample
differences. Cloze and comprehension measures might be more correlated
for some groups of subjects than for others. The use of subjects drawn
from different grade levels, for example, could lead to differences in
correlations. Again, the control of such differences should limit the
variation in the reported correlations.

A third possible source of variation in the correlations is imposed by
the limited construct validity of the cloze procedure. Comprehension test
questions can be written so as to require the use of information from
either within- or across-sentence boundaries (Bormuth, Carr, Manning, &
Pearson, 1970). Given that cloze is largely insensitive to the use of
cross-sentence information, it seems reasonable to expect cloze to be
best correlated with tests consisting of large proportions of
within-sentence items. Since the between-within sentence distinction was
not made in comprehension tests until recently, the proportions of
within- and across-sentence items were uncontrolled. Thus, the reading
tests used in concurrent validity studies may not have been tapping the
same part of comprehension measured by cloze.


To examine some of these issues we attempted to reanalyze the data from
the influential Bormuth (1962) study. This data set is reasonable to use
to explore the issues discussed here, because it represents one of the
most ambitious attempts to demonstrate the concurrent validity of cloze.
Bormuth collected cloze test and comprehension test data from a
reasonably large sample of subjects at three different grade levels; he
reported reliabilities for the various measures used in the study; he
made available the cloze passages and comprehension test questions; and
he reported more than 200 separate correlation coefficients of cloze with
comprehension. Specifically, this study reexamined the Bormuth data to
determine whether we can account for the concurrent validity of cloze in
terms of the types of questions used to assess comprehension. Of interest
is the within-sentence question variable drawn from the construct
validity literature. The variables of reliability and grade level were
included also.

This is an archival study. It provides additional analyses of
historically important data. These data, because they were collected by a
different researcher (Bormuth, 1962), for different purposes from those
of the present authors, impose certain limitations upon the analyses.
Nevertheless, it is the authors' contention that it is reasonable to
reanalyze the Bormuth data so as to determine whether a theoretical
construct (e.g., intersentential information integration) can explain
extant research results. This reanalysis is not intended as a criticism
of the Bormuth research, but rather it is an attempt to extend and
explain his findings in a theory-relevant manner.




Bormuth's Sample

Bormuth collected complete data on three grade level samples of 50
subjects. These subjects were attending grades four, five, and six in
schools in Indiana.

Bormuth's Materials

Nine passages of 275-300 words in length were written for use in this
study. These passages covered three different subject matters (stories,
history, science) and they were written at three levels of readability
(fourth, fifth, sixth). These passages were transformed into 5th-word
deletion cloze tests. Thirty-one comprehension questions were written for
each passage to measure comprehension of vocabulary (12 items per test);
factual recall (7 items); sequential order (2 items); cause and effect
(4 items); inference (5 items); and author's purpose (1 item). Each of
the 31 test items consisted of a sentence stem and four multiple-choice
sentence completion items.


Bormuth's Procedures

All subjects completed the nine cloze tests and the nine comprehension
tests. Subjects completed the cloze test for a passage before receiving
the corresponding intact passage with the questions. Reliabilities of the
tests were determined using the rational equivalence method. Correlations
were reported between the comprehension tests and the cloze tests. This
was done separately for each passage at each grade level, and for each
question type across passages at each grade level. Additional information
on the Bormuth data is available in the original report.

Reanalysis

Two separate reanalyses of the Bormuth data were completed. In order to
reanalyze the Bormuth data it was necessary to identify the number of
comprehension test items requiring within- and across-sentence
information. To accomplish this a set of rules was drafted based on the
system reported by Bormuth, Carr, Manning, and Pearson (1970).
Separately, the present researchers applied these rules to all nine
tests. Inter-rater reliability for the set of nine passages was
[illegible], although some items (vocabulary, sequence, main idea) were
rated more consistently than were others (inference, factual). For
example, the agreement for the sequential items was 1.00, while it was
only .775 for inference questions. Disagreements were resolved through
discussion.

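
The report does not say which index of inter-rater reliability was
computed. As one simple possibility, offered only as an assumption, the
sketch below computes the proportion of items on which two raters assign
the same within/across classification. The function name and the ratings
are hypothetical.

```python
def agreement_rate(rater_a, rater_b):
    """Proportion of items given the same classification by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty item set")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical classifications of four items as within- or across-sentence.
first_rater = ["within", "across", "within", "within"]
second_rater = ["within", "across", "across", "within"]
print(agreement_rate(first_rater, second_rater))
```

Computing the rate separately for each question category would show
which categories are classified less consistently.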
The first reanalysis used the correlations of cloze with comprehension as
a dependent measure. Bormuth reported correlations for each cloze passage
and its corresponding comprehension test for each grade level. In all, 27
such correlations were reported ranging from .46 to .87. Log linear
transformations of the correlations were performed to put them in an
appropriate form for analysis. The log transformations of these
correlations were analyzed through multiple regression analysis. The
correlations were regressed on the reliabilities of each of the tests,
the grade levels of the subjects, and the proportion of within-sentence
questions.
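
The report does not give the formula for the transformation applied to
the correlations. One common logarithm-based choice when correlations
serve as a dependent measure is Fisher's r-to-z transformation, which is
assumed here purely for illustration; the function names are
hypothetical.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def inverse_fisher_z(z):
    """Convert a transformed value back to the correlation metric."""
    return math.tanh(z)


# Endpoints of the range of the 27 correlations reported in the text.
for r in (0.46, 0.87):
    print(f"r = {r:.2f} -> z = {fisher_z(r):.3f}")
```

A transformation of this kind stretches the upper end of the scale, so
that differences among high correlations are not compressed by the
ceiling at 1.00.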




A second analysis used the correlations of cloze with each of the Bormuth
question categories (i.e., vocabulary, factual recall, etc.) for each
grade level. The 189 correlations ranged from [illegible] to .92. The log
transformations of these correlations were regressed on the reliabilities
of these measures, Bormuth's question type categories (a categorical
variable indicating which of the passages and which question types were
being used), grade level, and proportion of within-sentence questions.
The Bormuth question type variable was added to this analysis because the
reliabilities for question types were calculated across all nine
passages, rather than for each separately. Thus, the reliability
coefficients provided are only an approximation of the reliabilities of a
particular set of questions. Adding this categorical variable to the
analysis should reduce the error introduced by these reliability
estimates.

Results

The results of the first regression indicate that the reliability
variables account for a substantial amount of the variance in the
cloze-comprehension test correlations (R² = .39, F (2,24) = 7.51,
p<.01). The grade level variable accounts for an additional 24 percent of
the variance (F (1,23) = 14.4, p<.005). Finally, the proportion of
within-sentence questions accounts for 8 percent of the correlation
variance with the other independent variables already in the regression
(F (1,22) = 4.94, p<.01).
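
For readers unfamiliar with stepwise entry, the usual test for each added
block of predictors is the incremental F ratio: the gain in R² per added
degree of freedom, divided by the unexplained variance per residual
degree of freedom. The sketch below assumes that standard formula and
feeds it the rounded percentages reported above. Its output should not be
expected to reproduce the published F ratios exactly, since the R² values
are rounded and the report does not describe how its error terms were
computed. The function name is hypothetical.

```python
def f_change(r2_reduced, r2_full, df_num, df_den):
    """Incremental F for predictors added to a regression.

    r2_reduced: R squared before the new predictors enter
    r2_full:    R squared after they enter
    df_num:     number of predictors added
    df_den:     residual degrees of freedom of the fuller model
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    return ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)


# Cumulative R squared built from the rounded increments in the text
# (39 percent, then 24 percent, then 8 percent).
steps = [
    ("reliabilities", 0.00, 0.39, 2, 24),
    ("grade level", 0.39, 0.63, 1, 23),
    ("within-sentence proportion", 0.63, 0.71, 1, 22),
]
for label, before, after, df_num, df_den in steps:
    f_value = f_change(before, after, df_num, df_den)
    print(f"{label}: F({df_num},{df_den}) is roughly {f_value:.2f}")
```
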


The second analysis, the regression of the correlations of scores on the
various Bormuth question types with cloze scores, again found the
reliabilities to account for a substantial amount of variance (R² = .66,
F (2,186) = 185.10, p<.0001). Grade level accounts for an additional 6
percent of the variance (F (1,185) = 38.27, p<.0001). The Bormuth
question type variable was also significant (F (2,183) = 6.36, p<.05),
although it explained only 2 percent of the variance. Proportion of
within-sentence questions was then allowed to enter the regression. This
variable accounted for less than 1 percent of the variance (F (1,182) =
2.56, N.S.). The within-sentence question variable was significantly
related to the cloze-comprehension correlations (r = .19, p<.05), but its
contribution dropped to almost nothing with the other variables in the
equation.


Discussion and Conclusions

This study attempted to account for the variation in the correlations
between cloze scores and comprehension test scores. The results of the
reanalysis of the Bormuth data indicate that reliability and sampling
differences accounted for some of the variation in the correlations.
However, these variables were not sufficient to account for all of the
explainable variation. For this reason, the proportion of comprehension
test questions found to assess within-sentence information was then used
as an independent variable. Nevertheless, the results from the regression
analyses were mixed. The within-sentence variable increased the
explanation of cloze-comprehension correlations significantly in one
analysis, but not in the other.


The differences in the findings could be due to a number of factors.
Reliabilities of the tests used in the two reanalyses were measured more
directly in one analysis than in the other. That is, in the first
reanalysis they were based on the total comprehension test scores. In the
second, they were based on the Bormuth question type scores, but the
reliabilities for question types were computed across tests while the
correlations were reported within tests. To compensate for this problem a
Bormuth question-types variable was used. This variable was significantly
correlated with the within-across variable under study. Also, the
within-across distinction was made more reliably for some question types
than for others. The error introduced by this was balanced across tests
in the first reanalysis, but it was separated and unequal in the other.
Such differences could have led to different findings.

The findings suggest that the limited construct validity of cloze as a
measure of comprehension has a small but significant influence on the
concurrent validity of cloze with measures of comprehension. Cloze
predicts performance best in those situations in which within-sentence
comprehension dominates. Cloze is less useful in situations in which
students are expected to integrate information across sentence
boundaries.

Experimental research is now needed to explore these findings. Such
research could improve upon the design used here in a number of important
ways. First, the order of test administration should be balanced, in
order to reduce spurious increases in the correlations attributable to
order effects. Second, questions could be designed specifically to assess
within- and across-sentence information. Third, prior knowledge
assessment should be made to determine the text dependence of the
questions. If a question is answerable on the basis of prior knowledge,
then it is neither a within- nor an across-sentence item. Fourth,
question-answer items could be designed to replace Bormuth's cloze-like
sentence completion items. This would permit a more accurate estimate of
the cloze-comprehension correlations. Fifth, reliabilities could be
measured more accurately using Cronbach's alpha. This would reduce
spurious increases in the relationship of reliability with the
cloze-comprehension correlations. A study which deals with these problems
will be better able to account for the reasons that cloze is not always
well related to comprehension.
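
As a minimal sketch of the fifth suggestion, Cronbach's alpha can be
computed from a subjects-by-items score matrix as k/(k-1) times one minus
the ratio of summed item variances to total-score variance. For items
scored 0 or 1 this coincides with the Kuder-Richardson formula 20. The
function name and the small data set below are hypothetical and serve
only to show the calculation.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-subject lists of item scores."""
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item across subjects.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the subjects' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) scores of four subjects on three items.
example = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
print(cronbach_alpha(example))
```

Computing the coefficient separately for each passage and question type
would address the mismatch, noted above, between reliabilities computed
across tests and correlations reported within tests.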


Until such a direct test of the influence of construct validity upon
concurrent validity is made, two cautions seem reasonable. Those who use,
or recommend the use of, cloze as a useful instructional placement device
should beware of the great amount of variation in the relationship of
cloze with comprehension measures. This suggests that cloze can be, but
is often not, a useful prediction device. Also, this analysis indicates
that cloze is, under some conditions, better related to within-sentence
comprehension demands than to across-sentence ones. If across-sentence
comprehension is important, then there is a possibility that cloze will
be somewhat less effective as a predictor.